
Nebius Glossary, Decoded

Nebius technical terms from AI infrastructure, cloud platforms, containers, and machine learning ops — explained in plain English with real-world analogies any business person can understand.

A
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| A/B Testing | A method of comparing two versions of a model or system in production to see which performs better on real traffic. | Like trying two sales scripts with different customers to see which one closes more deals. | Reduces guesswork and improves model quality, conversion, or user experience before a full rollout. |
| Accelerator | Specialized hardware, such as a GPU or TPU, designed to speed up specific compute tasks like the matrix math used in AI training. | Like swapping a regular kitchen knife for a professional mandoline slicer: the same task, done 100× faster. | Cuts training and inference time dramatically, reducing cost and time-to-market for AI products. |
| Agent (AI) | An AI system that takes sequences of actions autonomously (browsing the web, calling tools, or writing code) to complete a goal. | Like a personal assistant who doesn't just answer questions but actually books flights and sends emails on your behalf. | Enables automation of complex multi-step business tasks that previously required human operators. |
| Apache Spark | A distributed data processing engine for large-scale analytics and data engineering workloads. | Like a warehouse crew splitting a huge sorting job across many workers at once. | Speeds up large data preparation and analytics jobs that feed AI and reporting. |
| API | Application Programming Interface: a defined way for software systems to send requests to and receive responses from another service. | Like a restaurant menu and waiter: you order in a standard way and the kitchen delivers it. | Lets teams connect apps, automate workflows, and use AI services without manual steps. |
| Attention Mechanism | A component inside transformer models that lets the model focus on relevant parts of the input when generating each output token. | Like a reader who highlights the most important sentences before answering a quiz question. | The core innovation that makes large language models powerful enough for complex reasoning and generation. |
| Autoscaling | The automatic increase or decrease of computing resources based on workload demand. | Like opening more checkout lanes when the store gets busy, then closing them when traffic drops. | Helps control cost while maintaining performance during traffic spikes. |
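To make the Attention Mechanism entry concrete, here is a minimal sketch of scaled dot-product attention (the core transformer operation) in plain Python. The function names and toy vectors are illustrative, not from any specific library.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability, then normalize exponentials.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    The output is a weighted mix of the value vectors, where the
    weights reflect how similar the query is to each key.
    """
    d = len(query)
    # Similarity score between the query and each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Blend the values according to the attention weights.
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# The query matches the first key far more strongly than the second,
# so the output leans heavily toward the first value vector.
out, weights = attention(query=[1.0, 0.0],
                         keys=[[1.0, 0.0], [0.0, 1.0]],
                         values=[[10.0, 0.0], [0.0, 10.0]])
```

This is exactly the "highlighting" from the analogy: the weights say how much of each input to look at when producing the output.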
B
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Batch Inference | Running model predictions on a large collection of data items as a job rather than responding one request at a time. | Like grading a whole stack of tests overnight instead of answering one student at a time. | Useful for offline scoring, reporting, content processing, and other non-real-time tasks. |
| Batch Job | A non-interactive task that runs to completion and usually processes data or compute work in the background. | Like dropping off laundry and picking it up when the cleaning is done. | Ideal for scheduled processing, training, evaluation, and large one-time workloads. |
| Batch Script | A shell script that defines how a batch job should run, including commands and requested resources. | Like a written work order telling a crew exactly what to do and what tools they need. | Makes compute jobs repeatable, consistent, and easier to automate. |
| Blackwell GPU | NVIDIA's Blackwell generation of accelerated computing hardware, available for self-service AI clusters. | Like a newer, faster engine model in the same family of sports cars. | Offers more performance for advanced model training and inference workloads. |
| Blueprint | An NVIDIA Blueprint: a packaged workflow that combines multiple AI steps into a ready-made starting point. | Like a meal kit that includes the ingredients and recipe for a complex dinner. | Speeds up deployment by giving teams a preassembled workflow instead of starting from scratch. |
| Boot Disk Image | A prebuilt disk template used to start a virtual machine with a chosen operating system and software stack. | Like buying a laptop with the operating system and core apps already installed. | Cuts setup time and helps standardize environments across teams. |
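The Batch Inference entry contrasts with request-by-request serving. Here is a minimal sketch of the pattern, with a stand-in `score` function in place of a real model call:

```python
def score(item):
    # Stand-in for a real model call; here, the "prediction" is just
    # the word count of the input text.
    return len(item.split())

def batch_inference(items, batch_size=2):
    """Process items in fixed-size batches, as an offline job would.

    A real system would send each whole batch to the model at once to
    amortize overhead; here we simply loop within the batch.
    """
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(score(item) for item in batch)
    return results

docs = ["great product", "works as described", "would buy again soon"]
scores = batch_inference(docs, batch_size=2)
```

The whole stack of documents is scored in one job run, with no live endpoint waiting for individual requests.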
C
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Checkpoint | A saved snapshot of a model's weights at a point during training so work can be resumed or the model version reused later. | Like hitting "Save" in a video game so you don't lose hours of progress if something goes wrong. | Protects expensive training runs from loss and enables teams to roll back to earlier, better-performing model versions. |
| CLI | Command-Line Interface: a text-based way to create, inspect, and manage cloud resources. | Like giving precise instructions at a drive-through speaker instead of tapping buttons on a kiosk. | Useful for automation, scripting, repeatability, and fast power-user workflows. |
| Cluster | A group of connected compute resources that work together as one system. | Like a team of workers assigned to one big project instead of one person doing it all. | Enables scale, resilience, and parallel processing for demanding AI jobs. |
| Container | A lightweight package that includes an application and all the software it needs to run consistently across environments. | Like packing a food order in a sealed box with everything needed inside. | Improves portability, consistency, and deployment speed. |
| Containerized Application | An application packaged and run inside a container. | Like shipping a mobile coffee stand that arrives with the espresso machine and supplies already inside. | Makes software easier to move between laptops, servers, and cloud platforms. |
| Context Window | The maximum number of tokens (words/pieces) a language model can read and consider at one time when generating a response. | Like the width of a desk: it determines how many pages you can spread out and reference at once while working. | Larger context windows let AI handle longer documents, conversations, and code files without losing information. |
| Controller Node | In a Slurm/Soperator setup, the node that manages scheduling and orchestration of jobs. | Like an air traffic controller directing planes to the right runways and gates. | Coordinates work so jobs start in the right place at the right time. |
| CPU | Central Processing Unit: the general-purpose processor in a computer that handles most instructions and logic sequentially. | Like a single brilliant chef who can cook anything but can only work on one dish at a time. | Handles coordination, business logic, and lighter workloads; the GPU handles the heavy parallel math for AI. |
| CUDA | NVIDIA's parallel computing platform and programming model for running workloads on GPUs. | Like giving chefs a special kitchen layout so they can cook many dishes at once very efficiently. | Lets developers accelerate training, inference, and scientific computing on NVIDIA GPUs. |
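The Context Window entry can be made concrete: before sending text to a model, applications count tokens and trim the input to fit the window. This sketch uses whitespace splitting as a stand-in for a real subword tokenizer; the function name is illustrative.

```python
def fit_to_context(text, max_tokens):
    """Keep only the most recent tokens that fit the model's window.

    Real tokenizers split text into subword pieces; whitespace
    splitting is a simplification for illustration.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    # Keep the tail of the conversation, which is usually most relevant.
    return " ".join(tokens[-max_tokens:])

history = "user asked about pricing then asked about refunds today"
trimmed = fit_to_context(history, max_tokens=4)
```

Anything trimmed away is simply invisible to the model, which is why long-running chats can "forget" early turns.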
D
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Data Lake | A centralized repository that stores raw, unstructured, and structured data at any scale until it is needed for analysis or training. | Like a massive warehouse where you toss everything you own: you can find and use anything later, but it takes some digging. | Enables organizations to capture all data now and decide how to use it for AI or analytics later. |
| Data Pipeline | An automated series of steps that moves, transforms, and loads data from its source to its destination for use in AI or analytics. | Like an assembly line that takes raw materials at one end and delivers finished, packaged products at the other. | Ensures clean, timely data reaches training and inference systems without manual intervention. |
| Data Preprocessing | Cleaning, transforming, and organizing raw data before using it for training or inference. | Like washing, peeling, and chopping ingredients before cooking. | Improves model accuracy and reliability by feeding cleaner input into the pipeline. |
| Deployment | The process of releasing a trained model or application into a production environment where real users or systems can access it. | Like opening a restaurant after months of recipe testing: the kitchen is finally serving real customers. | Turns model development into business value by making AI accessible to end users or downstream systems. |
| Device Plugin | A Kubernetes component that exposes specialized hardware, such as GPUs, to workloads running in the cluster. | Like a valet system that tells guests which specialty vehicles are actually available. | Allows containers to use scarce hardware resources in a controlled way. |
| Distributed Training | Training a model across multiple GPUs or machines working together. | Like several construction crews building different parts of the same stadium at the same time. | Shortens training time for large models and datasets. |
| Docker | A widely used container platform for building, packaging, and running containerized applications. | Like a standard brand of shipping container that works on many trucks, ships, and ports. | Makes application packaging and deployment more predictable and portable. |
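The Data Pipeline and Data Preprocessing entries describe a chain of stages. Here is a minimal extract-transform-load sketch; the stage names, fields, and in-memory "warehouse" are all illustrative assumptions.

```python
def extract(raw_rows):
    # Cleaning step: drop rows that are missing a required value.
    return [r for r in raw_rows if r.get("amount") is not None]

def transform(rows):
    # Normalization step: convert cents to dollars.
    return [{**r, "amount": r["amount"] / 100} for r in rows]

def load(rows, sink):
    # "Load" into an in-memory list standing in for a database or lake.
    sink.extend(rows)
    return sink

raw = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": None},   # dirty row, removed by extract()
    {"id": 3, "amount": 1000},
]
warehouse = load(transform(extract(raw)), sink=[])
```

Real pipelines run these stages on a schedule or a trigger, but the assembly-line shape is the same: each stage hands clean output to the next.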
E
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Embedding | A numerical vector that represents text, images, or other data in a way that captures meaning, so similar things end up near each other mathematically. | Like converting book summaries into GPS coordinates: books with similar themes end up geographically close together on a map. | Powers semantic search, recommendation engines, and retrieval-augmented AI by enabling similarity comparisons. |
| Endpoint | In Serverless AI, an interactive service that listens for requests and returns model responses until stopped. | Like a staffed service desk that stays open to answer incoming questions. | Supports real-time AI experiences such as chat, search, and live predictions. |
| Epoch | One complete pass through the entire training dataset by the model during the learning process. | Like reading a textbook from cover to cover once; multiple epochs mean reading it multiple times to absorb more. | More epochs improve model accuracy up to a point; too many cause overfitting, so monitoring epochs is essential. |
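The Embedding entry is about distance in vector space, and the standard measure is cosine similarity. Below is a minimal sketch with toy 3-dimensional vectors; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: "cat" and "kitten" point in similar directions,
# "invoice" points somewhere else entirely.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.1, 0.9]

cat_kitten = cosine_similarity(cat, kitten)
cat_invoice = cosine_similarity(cat, invoice)
```

This comparison is the engine behind the "GPS coordinates" analogy: semantic search simply returns the stored vectors closest to the query's vector.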
F
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Fault Tolerance | The ability of a system to keep operating even when some parts fail. | Like a restaurant staying open because another cook can step in when one calls out sick. | Improves uptime and user trust by reducing outages. |
| Fine-Tuning | Adapting a pre-trained model to a narrower task or dataset by continuing training on new examples. | Like taking a general athlete and coaching them for one specific sport. | Improves relevance and quality for a company's specific use case. |
| Foundation Model | A large model trained on broad data at scale that can be adapted to many downstream tasks through fine-tuning or prompting. | Like a general-purpose Swiss Army knife: it does many things reasonably well out of the box. | Reduces the cost of building AI capabilities by letting businesses start from a powerful, reusable base instead of training from scratch. |
| Framework | A reusable software foundation that provides structure and common components for building applications. | Like the frame of a house that gives builders a standard structure to work with. | Speeds development and reduces custom reinvention. |
G
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| GenAI | Generative AI: systems that create new content such as text, images, code, or audio. | Like a creative assistant that drafts new material instead of only retrieving old files. | Enables products like copilots, chatbots, content tools, and design assistants. |
| GlusterFS | A distributed file system referenced by Nebius solution examples for shared storage across cluster nodes. | Like a shared company filing room that many teams can access from different offices. | Helps multiple machines read and write shared data needed by training jobs. |
| GPU | Graphics Processing Unit: a processor built to handle many calculations in parallel, making it well suited for AI workloads. | Like having hundreds of prep cooks working at once instead of one chef doing everything. | Accelerates training, inference, and data processing workloads that would be too slow on general-purpose chips. |
| GPU Cluster | A cluster of machines or nodes equipped with GPUs and connected to work together on heavy workloads. | Like a fleet of tow trucks working the same major recovery job. | Provides the scale needed for large model training and high-throughput inference. |
| GPU Driver | System software that lets the operating system and applications communicate correctly with the GPU. | Like the translator between the driver and a very specialized race car. | Required to make GPU hardware usable and stable for AI workloads. |
| Grafana | A visualization tool used for dashboards that display metrics and system health. | Like a control-room wall of gauges and screens showing how the factory is running. | Helps teams monitor performance, spot anomalies, and troubleshoot faster. |
| Guardrails | Rules, filters, or checks placed around an AI model to prevent it from generating harmful, off-topic, or policy-violating outputs. | Like the bumpers in bumper bowling: they guide the ball and stop it from going somewhere it shouldn't. | Protects brand, users, and legal compliance by keeping AI outputs within acceptable boundaries. |
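The Guardrails entry is easy to demonstrate with the simplest kind of check: a pattern filter applied to model output before it reaches the user. Production guardrails layer many checks (toxicity, PII, topic drift); this sketch shows only a regex pass, and the patterns and refusal message are illustrative.

```python
import re

BLOCKED_PATTERNS = [
    r"\bssn\b",        # mentions of social security numbers by name
    r"\b\d{16}\b",     # 16-digit strings that look like card numbers
]

def apply_guardrail(model_output):
    """Return the output unchanged, or a refusal if it trips a rule."""
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, model_output, flags=re.IGNORECASE):
            return "[blocked: output violated content policy]"
    return model_output

safe = apply_guardrail("Your order ships Tuesday.")
unsafe = apply_guardrail("The card number is 4111111111111111.")
```

The model itself never changes; the guardrail sits around it, like the bumpers around the bowling lane.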
H
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Hallucination | When an AI model confidently generates text that sounds plausible but is factually wrong or entirely made up. | Like a tour guide confidently describing landmarks that don't exist, spoken fluently and convincingly. | One of the biggest risks in enterprise AI deployment; mitigated by grounding models in real data via RAG or fact-checking layers. |
| Helm | A package manager for Kubernetes that bundles application definitions into reusable "charts" for easy deployment. | Like an app store for Kubernetes: instead of building everything from scratch, you install pre-packaged software bundles. | Speeds up and standardizes how teams deploy complex applications and AI infrastructure components to Kubernetes. |
| High Availability | A design goal in which services are built to remain accessible with minimal downtime. | Like keeping a spare generator ready so the lights stay on when the power fails. | Protects revenue and operations by reducing service interruption. |
| Horovod | An open-source distributed deep learning training framework that helps scale model training across many GPUs. | Like a relay race where each runner carries the baton for their leg: each GPU trains on part of the data and passes results to the others. | Reduces training time for large models by efficiently coordinating work across dozens or hundreds of GPUs. |
| HPC | High-Performance Computing: the use of powerful, often parallel systems to solve compute-intensive problems. | Like using an industrial bakery instead of a home oven when you need ten thousand loaves. | Supports large simulations, scientific workloads, and massive model training runs. |
| Hyperparameter | A configuration value set before training begins (such as learning rate or batch size) that controls how the model learns. | Like the temperature and baking time settings you choose before putting bread in the oven: they shape the result without being part of the dough. | Tuning hyperparameters can dramatically improve model performance, and cloud platforms automate this search to save time. |
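The Hyperparameter entry mentions automated search; the simplest form is a grid search that tries every combination and keeps the best. This sketch replaces a real training run with a made-up scoring function (an assumption for illustration) that happens to peak at `lr=0.01` and `batch_size=32`.

```python
def train_and_score(learning_rate, batch_size):
    # Stand-in for a real training run: a synthetic score surface
    # that peaks at learning_rate=0.01 and batch_size=32.
    return -abs(learning_rate - 0.01) * 100 - abs(batch_size - 32) / 32

def grid_search(learning_rates, batch_sizes):
    """Try every combination of settings and keep the best-scoring one."""
    best = None
    for lr in learning_rates:
        for bs in batch_sizes:
            score = train_and_score(lr, bs)
            if best is None or score > best["score"]:
                best = {"lr": lr, "batch_size": bs, "score": score}
    return best

best = grid_search(learning_rates=[0.001, 0.01, 0.1],
                   batch_sizes=[16, 32, 64])
```

In practice each `train_and_score` call is an expensive GPU job, which is why platforms run these trials in parallel and prune bad settings early.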
I
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Infrastructure as Code (IaC) | The practice of defining and managing cloud infrastructure through machine-readable configuration files instead of manual setup. | Like a detailed recipe that automatically instructs a robot kitchen to cook the same meal perfectly every time. | Enables repeatable, auditable, version-controlled infrastructure deployments that reduce human error. |
| Inference | Using a trained model to generate predictions or outputs from new input data. | Like asking an experienced mechanic to identify a problem after years of training. | Turns trained models into usable products and business decisions. |
| InfiniBand | A high-throughput, low-latency networking technology used to connect systems for demanding compute workloads. | Like replacing neighborhood roads with a private high-speed express lane between factories. | Helps large GPU workloads exchange data faster, improving multi-node performance. |
| Interactive Workload | A workload that stays running and responds to incoming requests in real time. | Like a help desk agent who stays at the phone waiting for calls. | Supports user-facing services where fast response matters. |
J
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Job Allocation | The set of compute resources reserved by Slurm for a submitted job. | Like booking a meeting room, projector, and staff before an event starts. | Ensures the required resources are ready before the job runs. |
| Job Queue | The ordered list of submitted compute jobs waiting for resources to become available before they can begin running. | Like a numbered ticket system at a deli: your job is assigned a number and gets called when it's its turn. | Enables fair, priority-based sharing of expensive GPU resources across multiple teams or projects. |
| Job Scheduler | Software that decides when and where compute jobs should run based on rules and available resources. | Like a dispatcher assigning delivery routes to available drivers. | Improves utilization, fairness, and throughput in shared compute environments. |
| Jupyter Notebook | An interactive web-based environment where data scientists and developers can write and run code, visualize results, and document their work in one place. | Like a live science lab notebook where you can run experiments and record results on the same page. | Accelerates AI experimentation and model prototyping, and makes analytical work shareable across teams. |
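The Job Queue entry describes priority-based ordering, which maps directly onto a priority queue. A minimal sketch using Python's standard `heapq`; the job names and priority scheme (lower number runs first) are illustrative.

```python
import heapq
import itertools

class JobQueue:
    """Priority job queue: lower priority number runs first; ties run FIFO."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves submit order

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self):
        priority, _, name = heapq.heappop(self._heap)
        return name

queue = JobQueue()
queue.submit("nightly-report", priority=5)
queue.submit("prod-inference", priority=1)
queue.submit("experiment-42", priority=5)

order = [queue.next_job(), queue.next_job(), queue.next_job()]
```

Real schedulers like Slurm add fairness, quotas, and backfilling on top, but the core "who runs next" decision is this kind of ordered pop.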
K
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| kubectl | The native command-line tool used to work with Kubernetes clusters. | Like the remote control for a very large machine room. | Lets engineers inspect, update, and troubleshoot Kubernetes resources efficiently. |
| Kubernetes | An orchestration platform for deploying, scaling, and managing containerized applications. | Like an operations manager who decides where each shipping container goes and replaces broken trucks automatically. | Helps companies run AI and application workloads reliably at scale. |
| KV Cache | A memory optimization in transformer inference that stores previously computed key-value pairs to avoid redundant recalculation on each new token. | Like a student keeping scratch work on the side of the page so they don't have to redo calculations every time they need the same number. | Dramatically speeds up inference for long conversations and documents, reducing cost per query. |
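The KV Cache entry's core idea, never recompute key/value projections for tokens you have already seen, can be sketched without a real transformer. The `project` function below is a toy stand-in for a layer's key/value computation, and the counter just makes the savings visible.

```python
def project(token):
    # Stand-in for the key/value projection a transformer layer computes.
    return {"key": hash(token) % 97, "value": len(token)}

class KVCache:
    """Cache key/value pairs per token so past work is never redone."""

    def __init__(self):
        self.entries = []
        self.computations = 0  # how many projections we actually ran

    def extend(self, tokens):
        # Only tokens beyond what is already cached need computing.
        for token in tokens[len(self.entries):]:
            self.entries.append(project(token))
            self.computations += 1
        return self.entries

cache = KVCache()
cache.extend(["the", "cat", "sat"])        # first pass computes 3 entries
cache.extend(["the", "cat", "sat", "on"])  # only "on" is new
```

Without the cache, generating each new token would redo the work for the entire prefix, which is why long conversations get expensive fast.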
L
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Latency | The time it takes for a system to respond to a request, from when input is sent to when output is received. | Like the wait between pressing the elevator button and the doors opening. | Low latency is critical for interactive AI applications; high latency degrades user experience and revenue. |
| Llama | A family of large language models referenced in Nebius fine-tuning examples. | Like a pre-trained writer you can coach for your specific brand voice. | Serves as a base model for custom AI assistants and other language applications. |
| LLM | Large Language Model: an AI model trained on vast text data that can understand and generate human language for a wide variety of tasks. | Like a very well-read colleague who has processed millions of books, articles, and conversations and can discuss almost any topic. | Powers chatbots, copilots, summarization, coding assistants, and countless enterprise automation use cases. |
| Login Node | In a Slurm cluster, a node that users access to submit jobs and work with the environment. | Like the front desk and lobby where visitors arrive before going deeper into the building. | Provides a controlled entry point for users and workflows. |
| LogQL | A query language used for searching and analyzing logs in Loki-style systems. | Like a detective's search grammar for finding the right clues in a mountain of notes. | Helps teams quickly find errors, patterns, and operational signals in logs. |
| Loki | A log aggregation system often used alongside Grafana for storing and querying logs. | Like a searchable archive room for every machine and app message in the building. | Makes it easier to investigate incidents and understand system behavior. |
M
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Managed Kubernetes | Nebius's managed service for deploying and operating Kubernetes clusters with less manual overhead. | Like renting a serviced office where building management handles the plumbing and power. | Lets teams focus on applications instead of running every part of the cluster themselves. |
| Microservice | A software design pattern where an application is broken into small, independent services that communicate via APIs. | Like a food hall where each stall specializes in one cuisine: each works independently but they all serve the same customers. | Allows AI components like inference, preprocessing, and logging to be scaled, updated, and maintained independently. |
| MLflow | An open-source platform for managing machine learning experiments and model lifecycle tasks. | Like a lab notebook mixed with a project tracker for model work. | Improves experiment tracking, reproducibility, and collaboration. |
| ML/AI Workload | A computing task related to machine learning or AI, such as training, inference, preprocessing, or evaluation. | Like the collection of jobs in a movie studio: filming, editing, sound, and release. | Helps teams categorize and optimize the kinds of compute work they run. |
| MLOps | Machine Learning Operations: the practices and tools for deploying, monitoring, and maintaining ML models in production reliably. | Like having a dedicated pit crew for race cars: they keep the car running, swap tires, and ensure it stays competitive lap after lap. | Bridges the gap between model development and production, reducing the time and risk of getting AI into business systems. |
| Model Evaluation | Testing a model to measure how well it performs on defined tasks or datasets. | Like giving a student a final exam after months of study. | Prevents weak models from reaching customers and supports governance. |
| Model Registry | A centralized store where trained model versions are tracked, versioned, and managed across their lifecycle. | Like a library catalog: each book (model) is catalogued with its edition, where it came from, and who checked it out. | Ensures teams can trace model lineage, roll back to stable versions, and comply with AI governance requirements. |
| Monitoring | Collecting and viewing metrics, events, and health data about infrastructure and applications. | Like checking vital signs on a hospital monitor. | Helps teams detect issues early and keep services reliable. |
| Multi-Host Training | Training that spans multiple machines rather than a single server. | Like splitting a giant warehouse order across several fulfillment centers. | Needed when one machine does not have enough compute or memory for the job. |
| Multi-Node Workload | A job that runs across multiple nodes in a cluster. | Like a film production spread across several stages but following one production plan. | Allows larger or faster processing than a single node can provide. |
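The Model Evaluation entry's "final exam" is, at its simplest, an accuracy measurement on held-out data the model never saw during training. A minimal sketch; the labels and predictions are made-up examples.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Held-out test set: the model got 4 of 5 right.
labels = ["spam", "ham", "spam", "ham", "spam"]
predictions = ["spam", "ham", "ham", "ham", "spam"]

acc = accuracy(predictions, labels)
```

Real evaluations add task-specific metrics (precision, recall, BLEU, human ratings), but they all follow this shape: compare outputs against a trusted answer key before anything ships.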
N
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Namespace | A virtual partition inside a Kubernetes cluster that isolates resources, allowing multiple teams or environments to share the same cluster safely. | Like separate floors in a shared office building: different companies share the structure but have their own locked space. | Enables safe multi-tenant cluster usage and helps enforce resource quotas and access controls per team. |
| Nebius AI Cloud | A cloud platform purpose-built for AI workloads, offering GPU compute, managed Kubernetes, Slurm-based HPC, and AI-focused infrastructure services. | Like a specialty workshop fully equipped for auto restoration: built specifically for that craft, not a general rental space. | Provides AI and ML teams with optimized, cost-effective infrastructure without managing raw hardware. |
| NIM (NVIDIA) | NVIDIA Inference Microservices: containerized, optimized model servers that make it easy to deploy AI models at production scale. | Like a pre-assembled, road-tested food truck: the engine, kitchen, and menu are ready; you just choose where to park it. | Speeds time-to-production for AI inference by providing optimized serving containers from NVIDIA. |
| Node | A single machine or compute instance that participates in a cluster. | Like one employee on a larger team. | Serves as a building block for scalable systems. |
| Node Group | A set of similar nodes managed together inside a Kubernetes cluster. | Like a department made up of employees with the same role and equipment. | Makes it easier to scale and manage groups of machines consistently. |
| NVLink | NVIDIA's high-speed interconnect for moving data efficiently between GPUs. | Like a private hallway between offices instead of sending everything through the public street. | Improves performance when multiple GPUs need to share data rapidly. |
O
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Object Storage | A type of cloud storage that keeps data as discrete objects (files plus metadata) in a flat namespace, ideal for large unstructured data like datasets and model artifacts. | Like a massive self-storage facility where each unit is labeled with a unique number: you can retrieve any item instantly if you know its number. | Cost-effective way to store huge training datasets, model checkpoints, and output files at scale. |
| Observability | The practice of understanding system health by using metrics, logs, traces, and related signals. | Like not just checking that a car is on, but also having the dashboard, service log, and engine diagnostics. | Speeds troubleshooting and improves operational reliability. |
| Operator | In Kubernetes, software that automates the deployment and lifecycle management of complex applications. | Like a specialist caretaker who knows how to install, tune, heal, and upgrade a specific machine. | Reduces manual administration and enforces best practices. |
| Orchestration | Coordinating many compute resources and workloads so they run in the right place and order. | Like a conductor keeping an orchestra in sync. | Essential for scaling AI systems efficiently and reliably. |
| Overfitting | When a model learns training data too precisely, including its noise, and fails to generalize to new, unseen data. | Like a student who memorizes every practice test answer verbatim but can't solve a slightly rephrased question on the real exam. | A common AI quality failure; avoided with techniques like regularization, dropout, and proper data splitting. |
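The Overfitting entry's "memorizing student" can be shown directly. The toy model below is the extreme case: it memorizes training pairs exactly, so it is perfect on data it has seen and useless on anything new. The class and examples are illustrative.

```python
class MemorizingModel:
    """A 'model' that memorizes training pairs exactly, the extreme
    form of overfitting: perfect on seen data, useless on new data."""

    def fit(self, examples):
        self.table = dict(examples)

    def predict(self, x):
        return self.table.get(x, "unknown")

train = [("2+2", "4"), ("3+3", "6")]
test = [("4+4", "8")]   # unseen question

model = MemorizingModel()
model.fit(train)

train_acc = sum(model.predict(x) == y for x, y in train) / len(train)
test_acc = sum(model.predict(x) == y for x, y in test) / len(test)
```

The gap between training accuracy and test accuracy is exactly what data splitting is designed to expose before a model reaches customers.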
P
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Parallel Computing | Using many processors or cores at the same time to solve parts of a problem together. | Like having many people assemble different sections of the same puzzle simultaneously. | Cuts runtime for large workloads and makes advanced AI feasible. |
| Parameter | A single numerical value inside a model that is learned during training; models are often described by their parameter count (e.g., 7 billion). | Like individual adjustable screws inside a complex machine: each one affects how the machine behaves, and training tunes them all. | Parameter count signals model capability and resource requirements; more parameters generally means more power but higher cost. |
| Pod | The basic deployable unit in Kubernetes that runs one or more tightly coupled containers. | Like a small work booth that can hold one specialist or a tiny team that must travel together. | Provides a practical unit for scheduling, scaling, and managing applications. |
| PostgreSQL | An open-source relational database system. | Like a very organized digital filing cabinet with strong rules for storing and finding records. | Supports applications that need reliable structured data storage. |
| Pretraining | The initial phase of training a foundation model on a massive, broad dataset to give it general knowledge and language understanding. | Like putting a new hire through a comprehensive multi-year university education before they start their specialized role. | Creates the general-purpose capability that makes models like LLMs useful across many tasks without task-specific data. |
| Project | In Nebius, a resource boundary used to organize and control related cloud resources. | Like a labeled folder that keeps one department's budget, assets, and paperwork together. | Helps with organization, access control, and billing separation. |
| Prometheus | A monitoring system that collects and stores metrics from infrastructure and applications. | Like an automated clipboard that records performance readings at regular intervals. | Gives teams the data needed for dashboards, alerting, and capacity planning. |
| Prompt | The input text or instruction given to an AI model that tells it what task to perform or question to answer. | Like giving a briefing to a contractor: the clearer and more detailed your instructions, the better the outcome. | Prompt quality directly affects AI output quality; well-crafted prompts are a core enterprise skill for AI adoption. |
| Provisioning | Creating and configuring infrastructure resources so they are ready to use. | Like setting up desks, laptops, and badges before a new team arrives. | Speeds onboarding of systems and supports automation. |
| PyTorch | An open-source deep learning framework widely used for model research and production training, especially in AI labs. | Like a professional woodworking kit: flexible, precise, and favored by craftspeople who need full control. | The dominant framework for training state-of-the-art AI models; knowing its infrastructure requirements is key for cloud planning. |
Q
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| Quantization | A technique that reduces the numerical precision of a model's weights to make it smaller and faster, with minimal accuracy loss. | Like compressing a high-resolution photo to a smaller file size: it looks almost the same but takes up far less space. | Allows large models to run on less expensive hardware or edge devices, dramatically reducing inference cost. |
| Queue Depth | The number of requests or tasks waiting to be processed. | Like the number of people lined up at a coffee shop. | A useful signal for scaling systems and detecting bottlenecks. |
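The Quantization entry can be demonstrated with the simplest scheme: map float weights onto a small integer grid plus one scale factor, then map back and measure the error. This is a sketch of symmetric linear quantization; real systems use per-channel scales and calibration data.

```python
def quantize(weights, bits=8):
    """Map float weights onto a small integer grid plus a scale factor."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    return [q * scale for q in q_weights]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize(weights, bits=8)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now fits in one byte instead of four, and the round-trip error stays tiny, which is the "compressed photo" trade-off in miniature.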
R
| Term | Official Definition | Easy Analogy | Business Purpose |
| --- | --- | --- | --- |
| RAG | Retrieval-Augmented Generation: a pattern where an AI model retrieves relevant external documents at query time and uses them to ground its response. | Like an open-book exam: instead of relying only on memory, the model is allowed to look things up before answering. | Reduces hallucination and keeps AI responses accurate and up-to-date without expensive retraining. |
| Real-Time Inference | Producing model outputs immediately in response to live requests. | Like a translator speaking right as someone talks. | Important for chatbots, fraud checks, recommendations, and interactive apps. |
| Region | A geographic cloud location where resources run. | Like choosing which city to open a branch office in. | Affects latency, compliance, disaster planning, and service availability. |
| RLHF | Reinforcement Learning from Human Feedback: a technique for training AI models to produce outputs that align with human preferences using rated example responses. | Like coaching an employee by having managers rate their work and rewarding them when they get it right. | The key method used to make LLMs helpful, harmless, and honest; critical for safe enterprise AI deployment. |
| Rollout Strategy | The plan for releasing a new model or service version to users, such as gradual canary releases or full blue/green switches. | Like a restaurant testing a new menu item with just 10% of tables before rolling it out everywhere. | Minimizes risk by catching problems with a small user group before they affect everyone. |
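The RAG entry describes a retrieve-then-generate loop. Real RAG systems use embedding similarity over a vector store; the sketch below substitutes plain word overlap so the shape of the pattern stays visible in a few lines. The documents and prompt template are illustrative.

```python
DOCUMENTS = {
    "refunds": "Refunds are processed within 5 business days of approval.",
    "shipping": "Standard shipping takes 3 to 7 days within the EU.",
    "hardware": "All hardware carries a 2-year limited warranty.",
}

def retrieve(question, documents):
    """Pick the document sharing the most words with the question.

    A stand-in for embedding-based similarity search.
    """
    q_words = set(question.lower().split())

    def overlap(doc):
        return len(q_words & set(doc.lower().split()))

    return max(documents.values(), key=overlap)

def build_prompt(question, documents):
    context = retrieve(question, documents)
    # The model is instructed to ground its answer in the retrieved text.
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?", DOCUMENTS)
```

The model never needs the whole knowledge base in its context window; only the retrieved passage rides along with the question, which is what keeps answers current without retraining.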
S
TermOfficial DefinitionEasy AnalogyBusiness Purpose
SbatchA Slurm command used to submit a batch script for execution.Like handing a completed work order to the dispatcher.Standardizes how jobs are submitted to the scheduler.
SchedulingThe process of assigning jobs to available resources over time.Like deciding which meeting gets which room and when.Keeps shared infrastructure efficient and fair.
Serverless AINebius's service model for running AI endpoints or jobs without managing the underlying servers directly.Like catering a meal without renting and staffing the kitchen yourself.Speeds delivery and reduces operational overhead for AI teams.
Service AccountA non-human identity used by applications or automation to access cloud resources.Like a company badge issued to a robot worker instead of a person.Improves security and automation by separating app permissions from user accounts.
Shared FilesystemA storage system that multiple machines can access as a common file space.Like a shared drive everyone on a team can open.Important for training data, checkpoints, and collaboration across nodes.
Shared Responsibility ModelA cloud security principle where the provider secures some layers and the customer secures others.Like a landlord handling the building structure while the tenant locks their own office.Clarifies who is responsible for security, compliance, and operations.
SkyPilot NewAn open-source framework referenced by Nebius that automates launching AI and ML jobs across cloud providers and Kubernetes clusters.Like a universal flight booking agent that finds the cheapest and fastest available flight regardless of airline.Helps organizations minimize compute cost and maximize GPU availability across cloud platforms.
SlurmAn open-source workload manager and job scheduler widely used for high-performance computing and AI training.Like a factory foreman assigning heavy jobs to the right crews and machines.Helps organizations run large batch and training workloads efficiently.
SoperatorNebius's open-source Kubernetes operator that runs Slurm nodes as Kubernetes Pods, combining Slurm and Kubernetes in one infrastructure.Like giving the factory foreman a desk inside a modern operations control center so both can work together.Brings familiar Slurm job workflows to Kubernetes-based infrastructure.
SrunA Slurm command used to launch tasks or job steps inside an allocated job.Like telling individual crew members to start their part of the work after the site is reserved.Supports multi-step and multi-node execution inside scheduled jobs.
System Prompt NewA set of instructions given to an AI model at the start of a session that defines its role, behavior, and constraints throughout the conversation.Like a job description and employee handbook given to a new hire on day one — it shapes how they respond to everything.Lets businesses customize AI behavior for specific use cases like customer service, coding help, or legal review.
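The Sbatch, Slurm, and Srun entries above fit together in a single job script. The following is a hypothetical sketch: the job name, resource counts, and `train.py` are placeholders, not Nebius defaults.

```shell
#!/bin/bash
# Hypothetical Slurm batch script (all values are placeholders).
#SBATCH --job-name=train-demo     # label shown in the queue
#SBATCH --nodes=2                 # reserve two worker nodes
#SBATCH --gpus-per-node=8         # GPUs requested on each node
#SBATCH --time=04:00:00           # wall-clock limit for the job

# srun launches the training task on every node in the allocation.
srun python train.py
```

Submitting the file with `sbatch` hands the work order to the scheduler; `srun` inside the script then starts the crew once the site is reserved, matching the analogies above.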
T
TermOfficial DefinitionEasy AnalogyBusiness Purpose
Temperature NewA parameter that controls how random or creative an AI model's outputs are — low temperature means predictable, high temperature means more varied.Like a dial on a blender — low speed gives controlled, consistent results; high speed mixes things up wildly.Lets businesses tune AI behavior: low temperature for factual tasks, higher for brainstorming and creative work.
Tensor NewA multi-dimensional array used as the fundamental data structure in deep learning to represent everything from images to text embeddings.Like a spreadsheet that can have many dimensions — not just rows and columns, but stacks of pages deep.All data flowing through a neural network is a tensor; understanding tensors is foundational to AI infrastructure planning.
TerraformAn infrastructure-as-code tool used to define and provision cloud resources from configuration files.Like an architect's blueprint that can automatically assemble the building crew's work orders.Improves repeatability, version control, and automation for infrastructure.
Throughput NewThe number of requests, tasks, or tokens a system can process per unit of time.Like how many cars per hour a toll booth can process — it defines the capacity of the system.Higher throughput means more AI queries served at lower cost; critical for high-volume production deployments.
Token NewThe basic unit of text that an AI model reads and generates — typically a word or part of a word; models process and are priced by token count.Like Scrabble tiles — text is broken into individual tiles before the model "plays" with them.Token count drives inference cost and context window limits; understanding tokens helps budget AI API usage.
TPU NewTensor Processing Unit: Google's custom ASIC chip designed specifically to accelerate machine learning workloads, particularly matrix operations in neural networks.Like a specialized espresso machine versus a general kitchen stove — purpose-built to do one thing exceptionally fast.Offers an alternative to GPUs for large-scale AI training, with different cost/performance tradeoffs depending on workload.
TrainingThe process of teaching a model by adjusting it using data so it can perform a task.Like coaching a new employee through many examples until they get good at the job.Creates the model capability that later powers AI products.
Transformer NewThe dominant neural network architecture behind modern LLMs and vision models, based on the attention mechanism to relate all parts of an input to each other.Like a team of editors who each read the entire document and highlight how every sentence relates to every other sentence.Understanding the transformer architecture is foundational to AI infrastructure decisions, as it drives GPU memory and compute requirements.
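The Temperature entry above has a precise meaning that a few lines of Python can show. This is an illustrative sketch: real LLMs apply the same scaling to vocabulary-sized logit vectors at every generated token, and the logit values here are invented.

```python
import math

# Minimal sketch of temperature scaling over a model's output scores (logits).

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature reshapes the spread."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cautious = softmax_with_temperature(logits, 0.5)  # sharpens toward the top choice
creative = softmax_with_temperature(logits, 2.0)  # flattens across all choices
```

Low temperature concentrates probability on the most likely token (the "low blender speed"), while high temperature spreads it out, making varied, creative samples more likely.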
U
TermOfficial DefinitionEasy AnalogyBusiness Purpose
Utilization NewThe percentage of available compute resources (GPU, CPU, memory) actively being used at any moment.Like how many seats in a restaurant are filled at dinner — low utilization means you're paying for empty tables.High utilization means better return on expensive GPU infrastructure; monitoring utilization is key to cost optimization.
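The Utilization entry above translates directly into money, which a back-of-envelope calculation makes concrete. The hourly rate below is a made-up placeholder, not a Nebius price.

```python
# Back-of-envelope sketch: what idle GPU capacity costs at a given utilization.

def idle_spend(hourly_rate, gpus, hours, utilization):
    """Money paid for capacity that sat unused (the 'empty tables')."""
    total = hourly_rate * gpus * hours
    return total * (1 - utilization)

# 8 GPUs at a hypothetical $2/hour for a 720-hour month, 40% utilized:
wasted = idle_spend(2.0, 8, 720, 0.40)  # roughly $6,912 of the $11,520 bill
```

At 40% utilization, well over half the monthly bill buys nothing, which is why utilization dashboards are a first stop for cost optimization.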
V
TermOfficial DefinitionEasy AnalogyBusiness Purpose
Vector Database NewA database designed to store and search vector embeddings efficiently, enabling fast similarity lookups across millions of items.Like a music app that can find songs that "sound similar" to one you like — it's searching by feel, not by exact keyword.Powers semantic search, RAG systems, and recommendation engines by enabling AI to find conceptually related content instantly.
Virtual Machine (VM)A software-defined computer that runs its own operating system on shared physical hardware.Like renting an apartment in a larger building: it feels like your own place even though the structure is shared.Provides flexible isolated compute for apps, data jobs, and AI workloads.
vLLM NewAn open-source high-throughput LLM serving engine that uses paged attention to dramatically increase GPU utilization during inference.Like a highly efficient hotel concierge who handles many guests simultaneously instead of helping one at a time.Reduces inference cost per query significantly and is widely used in production LLM deployments on platforms like Nebius.
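The Vector Database entry above rests on one operation: measuring how similar two embedding vectors are. This sketch uses cosine similarity over tiny made-up 3-dimensional "embeddings"; real systems index millions of model-generated vectors with approximate-nearest-neighbor algorithms such as HNSW.

```python
import math

# Minimal sketch of the similarity search at the heart of a vector database.

def cosine(a, b):
    """Similarity of two embedding vectors: 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for model-generated vectors.
items = {
    "jazz track": [0.9, 0.1, 0.0],
    "blues track": [0.8, 0.2, 0.1],
    "spreadsheet": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # an embedding "close in feel" to the music items
best = max(items, key=lambda name: cosine(items[name], query))
```

The query vector never matches any keyword; it simply points in nearly the same direction as the music embeddings, which is the "searching by feel" in the analogy.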
W
TermOfficial DefinitionEasy AnalogyBusiness Purpose
Worker NodeA node that actually runs application workloads or Slurm jobs.Like the production floor where the real manufacturing happens.Provides the compute capacity that does the work.
WorkflowAn ordered sequence of steps that together complete a task or process.Like a recipe with stages from prep to cooking to plating.Helps teams standardize complex AI and data operations.
Workload ManagerA category of software that manages compute jobs and resource usage; Nebius discusses Slurm and Kubernetes in this role.Like a dispatcher coordinating people, rooms, and equipment for many projects.Improves utilization, scheduling, and operational control in shared environments.
X
TermOfficial DefinitionEasy AnalogyBusiness Purpose
XGBoost NewAn optimized gradient-boosting framework for training fast, accurate models on tabular (spreadsheet-style) data.Like having a committee of specialists vote on each decision — their combined judgment is more accurate than any single expert.Widely used for structured data predictions like fraud detection, churn, and pricing — often outperforms deep learning for tabular data.
Y
TermOfficial DefinitionEasy AnalogyBusiness Purpose
YAML NewA human-readable configuration file format used extensively in Kubernetes, CI/CD pipelines, and infrastructure-as-code to define how systems should behave.Like a plain-English instruction manual for machines — written in a way both humans and computers can understand.The de facto format for configuring AI infrastructure in Kubernetes; every DevOps and MLOps team works with YAML daily.
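The YAML entry above is easiest to grasp with a short example. This is a hypothetical Kubernetes Deployment fragment: the names and container image are placeholders, not a real Nebius configuration.

```yaml
# Hypothetical Kubernetes Deployment; names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference-server        # human-readable and versionable in git
spec:
  replicas: 2                   # desired number of identical pods
  selector:
    matchLabels:
      app: inference-server
  template:
    metadata:
      labels:
        app: inference-server
    spec:
      containers:
        - name: server
          image: example.com/inference:latest   # placeholder image
```

Indentation carries the structure, and every line is readable by both a person reviewing the change and the machine applying it, which is the "instruction manual for machines" in the analogy.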
Z
TermOfficial DefinitionEasy AnalogyBusiness Purpose
Zero-Downtime Deployment NewA deployment strategy that updates a running service to a new version without any interruption in service for users.Like repaving a highway by shifting traffic to one lane — the road keeps functioning while the work gets done.Critical for production AI services where downtime means lost revenue and degraded user trust.
Zero-Shot Learning NewThe ability of an AI model to perform a task it was never explicitly trained on, using only a description or prompt at inference time.Like asking a multilingual professor to grade a paper in a language they've never formally studied — they use their general knowledge to figure it out.Enables rapid prototyping of AI capabilities without needing labeled training data for every new task.